1 TextTiling: segmenting text into multi-paragraph subtopic passages
Marti A. Hearst 

March 1997 Computational Linguistics, Volume 23 Issue 1
Publisher: MIT Press 
Full text available: pdf(2.46 MB), Publisher Site Additional Information: full citation, abstract, references, citings



TextTiling is a technique for subdividing texts into multi-paragraph units that represent 
passages, or subtopics. The discourse cues for identifying major subtopic shifts are 
patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and 
is shown to produce segmentation that corresponds well to human judgments of the 
subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful 
for many text analysis tasks, including information retrieval and ... 

2 Adapting content to mobile devices: Fractal summarization for mobile devices to
access large documents on the web
Christopher C. Yang, Fu Lee Wang

May 2003 Proceedings of the 12th international conference on World Wide Web 

Publisher: ACM Press 

Full text available: pdf(317.55 KB) Additional Information: full citation, abstract, references, index terms

Wireless access with mobile (or handheld) devices is a promising addition to the WWW and 
traditional electronic business. Mobile devices provide convenience and portable access to 
the huge information space on the Internet without requiring users to be stationary with 
network connection. However, the limited screen size, narrow network bandwidth, small 
memory capacity and low computing power are the shortcomings of handheld devices. 
Loading and visualizing large documents on handheld devices bec ...

Keywords: document summarization, fractal summarization, handheld devices, mobile 
commerce 



3 Industrial and practical experience track paper session 1: Ranking definitions with 
supervised learning methods 
Jun Xu, Yunbo Cao, Hang Li, Min Zhao

This paper is concerned with the problem of definition search. Specifically, given a term,
we are to retrieve definitional excerpts of the term and rank the extracted excerpts 
according to their likelihood of being good definitions. This is in contrast to the traditional 
approaches of either generating a single combined definition or simply outputting all 
retrieved definitions. Definition ranking is essential for the task. Methods for performing 
definition ranking are proposed in this paper, whi ... 

Keywords: classification, ordinal regression, search of definitions, text mining, web 
mining, web search 



Passage-level evidence in document retrieval 
James P. Callan 

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(805.57 KB) Additional Information: full citation, references, citings, index terms

Summarization: Cross-document summarization by concept classification

Hilda Hardy, Nobuyuki Shimizu, Tomek Strzalkowski, Liu Ting, Xinyang Zhang, G. Bowden Wise

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(246.03 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we describe a Cross Document Summarizer XDoX designed specifically to 
summarize large document sets (50-500 documents and more). Such sets of documents 
are typically obtained from routing or filtering systems run against a continuous stream of 
data, such as a newswire. XDoX works by identifying the most salient themes within the 
set (at the granularity level that is regulated by the user) and composing an extraction 
summary, which reflects these main themes. In the current version, ... 

Keywords: clustering, multi-document summarization, n-grams, passage similarity, 
summary, term weights

Abstracting of legal cases: the SALOMON experience
Marie-Francine Moens, Caroline Uyttendaele, Jos Dumortier 

June 1997 Proceedings of the 6th international conference on Artificial intelligence 
and law

8 Web Clustering, filtering and applications: Narrative text classification for automatic
key phrase extraction in web document corpora
Yongzheng Zhang, Nur Zincir-Heywood, Evangelos Milios 

November 2005 Proceedings of the 7th annual ACM international workshop on Web 
information and data management WIDM '05 

Publisher: ACM Press 

Full text available: pdf(184.38 KB) Additional Information: full citation, abstract, references, index terms

Automatic key phrase extraction is a useful tool in many text related applications such as 
clustering and summarization. State-of-the-art methods are aimed towards extracting key 
phrases from traditional text such as technical papers. Application of these methods on 
Web documents, which often contain diverse and heterogeneous contents, is of particular 
interest and challenge in the information age. In this work, we investigate the significance 
of narrative text classification in the task of auto ... 

Keywords: acceptable percentage, key phrase extraction, narrative text classification 




Multidocument summarization: An added value to clustering in interactive retrieval 

Manuel J. Mana-Lopez, Manuel De Buenaga, Jose M. Gomez-Hidalgo 

April 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 2 

Publisher: ACM Press 

Full text available: pdf(199.91 KB) Additional Information: full citation, abstract, references, index terms, review

A more and more generalized problem in effective information access is the presence in 
the same corpus of multiple documents that contain similar information. Generally, users 
may be interested in locating, for a topic addressed by a group of similar documents, one 
or several particular aspects. This kind of task, called instance or aspectual retrieval, has 
been explored in several TREC Interactive Tracks. In this article, we propose in addition to 
the classification capacity of clustering techn ... 

Keywords: Multidocument summarization, topic segmentation 



10 Selective text utilization and text traversal 
Gerard Salton, James Allen 

December 1993 Proceedings of the fifth ACM conference on Hypertext 

Publisher: ACM Press 

Full text available: pdf(1.20 MB) Additional Information: full citation, references, citings, index terms



Keywords: automatic text linking, full-text access, global text comparisons, information 
retrieval, local context checking, passage retrieval, selective text reading, text analysis, 
text summarization

Publisher: ACM Press

Full text available: pdf(955.73 KB) Additional Information: full citation, references, citings, index terms

Research and development in information retrieval

Publisher: ACM Press 

Full text available: pdf(830.49 KB) Additional Information: full citation, abstract, references

ANNOD is the name of a system developed at the National Library of Medicine (NLM), 
which implements a set of linguistic and empirical techniques that permit retrieval of 
natural language information in response to natural language queries. The system is based 
on Dr. Gerard Salton's SMART [1] document retrieval system and is presently 
implemented on a mini-computer as part of an Interactive TExt Management System, 
ITEMS. [2] Actual experience with retrieval of information from NLM's Hepatitis ... 

13 Efficient web browsing on handheld devices using page and form summarization

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1

Publisher: ACM Press

Full text available: pdf(4.47 MB) Additional Information: full citation, references, citings, index terms, review

We present a design and implementation for displaying and manipulating HTML pages on 
small handheld devices such as personal digital assistants (PDAs), or cellular phones. We 
introduce methods for summarizing parts of Web pages and HTML forms. Each Web page 
is broken into text units that can each be hidden, partially displayed, made fully visible, or 
summarized. A variety of methods are introduced that summarize the text units. In 
addition, HTML forms are also summarized by displaying just the t ... 

Keywords: PDA, Personal digital assistant, WAP, WML, forms, handheld computers, 
mobile computing, summarization, ubiquitous computing, wireless computing 



14 Summarization-based query expansion in information retrieval 
Tomek Strzalkowski, Jin Wang, Bowden Wise 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 2 , Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 2 

Publisher: Association for Computational Linguistics , Association for Computational Linguistics 

Full text available: pdf(726.07 KB), Publisher Site Additional Information: full citation, abstract, references

We discuss a semi-interactive approach to information retrieval which consists of two tasks 
performed in a sequence. First, the system assists the searcher in building a 
comprehensive statement of information need, using automatically generated topical 
summaries of sample documents. Second, the detailed statement of information need is 
automatically processed by a series of natural language processing routines in order to 
derive an optimal search query for a statistical information retrieval sys ... 

15 Complete formal model for information retrieval systems 
Jean Tague, Airi Salminen, Charles McClellan 

September 1991 Proceedings of the 14th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(718.35 KB) Additional Information: full citation, references, citings, index terms

August 1999 ACM SIGDOC Asterisk Journal of Computer Documentation, Volume 23 Issue 3

Publisher: ACM Press

17 The effects of analysing cohesion on document summarisation
Branimir K. Boguraev, Mary S. Neff 

July 2000 Proceedings of the 18th conference on Computational linguistics - Volume 
1 

Publisher: Association for Computational Linguistics 

Full text available: pdf(853.62 KB) Additional Information: full citation, abstract, references

We argue that in general, the analysis of lexical cohesion factors in a document can drive a 
summarizer, as well as enable other content characterization tasks. More narrowly, this 
paper focuses on how one particular cohesion factor, simple lexical repetition, can
enhance an existing sentence extraction summarizer, by enabling strategies for 
overcoming some particularly jarring end-user effects in the summaries, typically due to 
coherence degradation, readability deterioration, and topical unde ... 

18 Links for a better web: Enhanced web document summarization using hyperlinks 
J.-Y. Delort, B. Bouchon-Meunier, M. Rifqi 

August 2003 Proceedings of the fourteenth ACM conference on Hypertext and 
hypermedia 

Publisher: ACM Press 

Full text available: pdf(167.88 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper addresses the issue of Web document summarization. As textual content of 
Web documents is often scarce or irrelevant and existing summarization techniques are 
based on it, many Web pages and websites cannot be suitably summarized. We consider 
the context of a Web document by the textual content of all the documents linking to it. To 
summarize a target Web document, a context-based summarizer has to perform a 
preprocessing task, during which it will be decided which pieces of informati ... 

Keywords: context, hyperlinks, summarization, web document

April 2001 Proceedings of the 10th international conference on World Wide Web

Publisher: ACM Press

Keywords: PDA, WAP, handheld computers, mobile computing, personal digital assistant,
summarization, ubiquitous computing, wireless computing

and comparing web pages
Akiyo Nadamoto, Katsumi Tanaka

May 2003 Proceedings of the 12th international conference on World Wide Web 

Publisher: ACM Press 

Full text available: pdf(230.19 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we propose a new type of Web browser, called the Comparative Web 
Browser(CWB), which concurrently presents multiple Web pages in a way that enables the 
content of the Web pages to be automatically synchronized. The ability to view multiple 
Web pages at one time is useful when we wish to make a comparison on the Web, such as 
when we compare similar products or news articles from different newspapers. The CWB is 
characterized by (1) automatic content-based retrieval of passages from ...

personalized search and summarization in a medical digital library

May 2003 Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

Publisher: IEEE Computer Society 

Full text available: pdf(116.18 KB) Additional Information: full citation, abstract, references, citings, index terms



Despite the large amount of online medical literature, it can be difficult for clinicians to find 
relevant information at the point of patient care. In this paper, we present techniques to 
personalize the results of search, making use of the online patient record as a 
sophisticated, pre-existing user model. Our work in PERSIVAL, a medical digital library, 
includes methods for re-ranking the results of search to prioritize those that better match 
the patient record. It also generates summa ... 

22 A loosely-coupled integration of a text retrieval system and an object-oriented
database system
W. Bruce Croft, Lisa A. Smith, Howard R. Turtle

June 1992 Proceedings of the 15th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(854.18 KB) Additional Information: full citation, abstract, references, citings, index terms

Document management systems are needed for many business applications. This type of 
system would combine the functionality of a database system, (for describing, storing and 
maintaining documents with complex structure and relationships) with a text retrieval 
system (for effective retrieval based on full text). The retrieval model for a document 
management system is complicated by the variety and complexity of the objects that are 
represented. In this paper, we describe an approach to compl ... 



23 The rhetorical parsing of unrestricted texts: a surface-based approach 
Daniel Marcu 

September 2000 Computational Linguistics, Volume 26 issue 3 
Publisher: MIT Press 

Full text available: pdf(3.87 MB), Publisher Site Additional Information: full citation, abstract, references

Coherent texts are not just simple sequences of clauses and sentences, but rather 
complex artifacts that have highly elaborate rhetorical structure. This paper explores the
extent to which well-formed rhetorical structures can be automatically derived by means
of surface-form-based algorithms. These algorithms identify discourse usages of cue 
phrases and break sentences into clauses, hypothesize rhetorical relations that hold 
among textual units, and produce valid rhetorical structure trees for ... 

24 Description and Analysis: Using web structure for classifying and describing web
pages
Eric J. Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M. Pennock, Gary W. Flake
May 2002 Proceedings of the 11th international conference on World Wide Web 

Publisher: ACM Press 

Full text available: pdf(136.12 KB) Additional Information: full citation, abstract, references, citings, index terms

The structure of the web is increasingly being used to improve organization, search, and 
analysis of information on the web. For example, Google uses the text in citing documents 
(documents that link to the target document) for search. We analyze the relative utility of 
document text, and the text in citing documents near the citation, for classification and 
description. Results show that the text in citing documents, when available, often has 
greater discriminative and descriptive power than th ... 

Keywords: SVM, anchortext, classification, cluster naming, entropy based feature 
extraction, evaluation, web directory, web structure 



25 Cross-lingual C*ST*RD: English access to Hindi information
Anton Leuski, Chin-Yew Lin, Liang Zhou, Ulrich Germann, Franz Josef Och, Eduard Hovy

September 2003 ACM Transactions on Asian Language Information Processing
(TALIP), Volume 2 Issue 3 
Publisher: ACM Press 

Full text available: pdf(210.61 KB) Additional Information: full citation, abstract, references, index terms

We present C*ST*RD, a cross-language information delivery system that supports cross- 
language information retrieval, information space visualization and navigation, machine 
translation, and text summarization of single documents and clusters of documents. 
C*ST*RD was assembled and trained within 1 month, in the context of DARPA's Surprise 
Language Exercise, that selected as source a heretofore unstudied language, Hindi. Given 
the brief time, we could not create deep Hindi capabilities for all th ... 

Keywords: Cross-language information retrieval, Hindi-to-English machine translation, 
headline generation, information retrieval and information space navigation, single- and 
multi-document text summarization 



26 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research
Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the execution 
of the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not 
provide the user with the desired overview of the application. In our experience, such tools 
display repeated occurrences of non-trivial commun ... 

27 Creating and evaluating multi-document sentence extract summaries 
Jade Goldstein, Vibhu Mittal, Jaime Carbonell, Jamie Callan

November 2000 Proceedings of the ninth international conference on Information and
knowledge management

Publisher: ACM Press 

Full text available: pdf(186.89 KB) Additional Information: full citation, references, citings, index terms



28 A system for discovering relationships by feature extraction from text databases 
Jack G. Conrad, Mary Hunter Utt 

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(911.76 KB) Additional Information: full citation, references, citings, index terms

29 HyPursuit: a hierarchical network search engine that exploits content-link hypertext
clustering
Ron Weiss, Bienvenido Velez, Mark A. Sheldon

March 1996 Proceedings of the seventh ACM conference on Hypertext

Publisher: ACM Press 

Full text available: pdf(2.00 MB) Additional Information: full citation, references, citings, index terms



30 XML retrieval: Configurable indexing and ranking for XML information retrieval
Shaorong Liu, Qinghua Zou, Wesley W. Chu

July 2004 Proceedings of the 27th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '04 

Publisher: ACM Press 

Full text available: pdf(362.38 KB) Additional Information: full citation, abstract, references, citings, index terms

Indexing and ranking are two key factors for efficient and effective XML information 
retrieval. Inappropriate indexing may result in false negatives and false positives, and 
improper ranking may lead to low precisions. In this paper, we propose a configurable XML 
information retrieval system, in which users can configure appropriate index types for XML 
tags and text contents. Based on users' index configurations, the system transforms XML 
structures into a compact tree representation, Ctree, and ... 

Keywords: XML indexing, XML information retrieval, XML ranking, ranking 



31 Systems: University of Durham: description of the LOLITA system as used in MUC-6
Richard Morgan, Roberto Garigliano, Paul Callaghan, Sanjay Poria, Mark Smith, Chris Cooper
November 1993 Proceedings of the 6th conference on Message understanding MUC6 '95

Publisher: Association for Computational Linguistics 

Full text available: pdf(1.13 MB) Additional Information: full citation, abstract, references

This document describes the LOLITA system and how it was extended to run the four MUC 
tasks, discusses the resulting system's performance on the required "walk-through" 
article, and then considers the performance of this system on the final evaluation set. 



32 Summarization: Topic themes for multi-document summarization 
Sanda Harabagiu, Finley Lacatusu 

August 2005 Proceedings of the 28th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '05 

Publisher: ACM Press 

Full text available: pdf(245.85 KB) Additional Information: full citation, abstract, references, index terms

The problem of using topic representations for multi-document summarization (MDS) has
received considerable attention recently. In this paper, we describe five different topic 
representations and introduce a novel representation of topics based on topic themes. We 
present eight different methods of generating MDS and evaluate each of these methods on 
a large set of topics used in past DUC workshops. Our evaluation results show a significant 
improvement in the quality of summaries based on topic ... 

Keywords: summarization, topic themes 



33 Getting and giving information: What is this text about? 

Nicolas Hernandez, Brigitte Grau
October 2003 Proceedings of the 21st annual international conference on
Documentation
Publisher: ACM Press 

Full text available: pdf(229.96 KB) Additional Information: full citation, abstract, references, index terms

Most work in text retrieval aims at presenting the information held by several texts in 
order to give entry clues towards these texts and to allow a navigation between them. 
Besides, a lesser interest is dedicated to the definition of principles for accessing content 
of single documents. As most information retrieval systems return documents from an 
initial request made of words, a usual solution consists of presenting document titles and 
highlighting words of the request inside a passage or in ... 

Keywords: dynamic summarization, meta-descriptors and topical descriptors
identification, text structure, text visualization 



34 Querying structured documents with hypertext links using OODBMS
V. Christophides, A. Rizk 

September 1994 Proceedings of the 1994 ACM European conference on Hypermedia 
technology 

Publisher: ACM Press 

Full text available: pdf(1.32 MB) Additional Information: full citation, abstract, references, citings, index terms

Hierarchical logical structure and hypertext links are complementary and can be combined 
to build more powerful document management systems. Previous work exploits this 
complementarity for building better document processors, browsers and editing tools, but 
not for building sophisticated querying mechanisms. Querying in hypertext has been a 
requirement since [19] and has already been elaborated in many hypertext systems, but 
has not yet been used for hypertext systems superimposed on an und ... 

Keywords: hypertexts, information retrieval, object oriented databases, path 
expressions, query languages, structured documents 



35 Assistive technologies for individuals with visual impairments 1: Gist summaries for
visually impaired surfers
Simon Harper, Neha Patel

October 2005 Proceedings of the 7th international ACM SIGACCESS conference on 
Computers and accessibility Assets '05 

Publisher: ACM Press 

Full text available: pdf(2.19 MB) Additional Information: full citation, abstract, references, index terms

Anecdotal evidence suggests that Web document summaries provide the sighted reader 
with a basis for making decisions regarding the route to take within non-linear text; and 
additional research shows that sighted people use 'Gist' summaries as decision points to 
bolster their browsing behaviour. Other studies have found that visually impaired users are 
hindered in their cognition of the content of Web-pages because users must wait for an 
entire Web-page to be read before deciding on its usefulne ...

Keywords: document engineering, tools, visual impairment, web



36 Text classification: Web-page classification through summarization 

Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, Wei-Ying Ma 
July 2004 Proceedings of the 27th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '04
Publisher: ACM Press

Full text available: pdf(225.80 KB) Additional Information: full citation, abstract, references, index terms, review

Web-page classification is much more difficult than pure-text classification due to a large 
variety of noisy information embedded in Web pages. In this paper, we propose a new 
Web-page classification algorithm based on Web summarization for improving the 
accuracy. We first give empirical evidence that ideal Web-page summaries generated by 
human editors can indeed improve the performance of Web-page classification algorithms. 
We then propose a new Web summarization-based classification algorithm ... 

Keywords: content body, web page categorization, web page summarization 



37 1a - Links and Navigation: The look of the link - concepts for the user interface of
extended hyperlinks 

Harald Weinreich, Hartmut Obendorf, Winfried Lamersdorf 

September 2001 Proceedings of the twelfth ACM conference on Hypertext and 
Hypermedia 

Publisher: ACM Press 

Full text available: pdf(307.01 KB) Additional Information: full citation, abstract, references, citings, index terms

The design of hypertext systems has been subject to intense research. Apparently, one 
topic was mostly neglected: how to visualize and interact with link markers. 

This paper presents an overview of pragmatic historical approaches, and discusses 
problems evolving from sophisticated hypertext linking features. Blending the potential of 
an XLink-enhanced Web with old ideas and recent GUI techniques, a vision for browser 
link interfaces of the future is being developed. We hope to stimula ... 

Keywords: Web, XLink, distributed hypertext, link marker, user interface 



38 Case-based reasoning: Automatic summarisation of legal documents 
Claire Grover, Ben Hachey, Ian Hughson, Chris Korycinski

June 2003 Proceedings of the 9th international conference on Artificial intelligence
and law 

Publisher: ACM Press 

Full text available: pdf(323.69 KB) Additional Information: full citation, abstract, references

We report on the SUM project which applies automatic summarisation techniques to the 
legal domain. We describe our methodology whereby sentences from the text are 
classified according to their rhetorical role in order that particular types of sentence can be 
extracted to form a summary. We describe some experiments with judgments of the 
House of Lords: we have performed automatic linguistic annotation of a small sample set 
and then hand-annotated the sentences in the set in order to explore the ... 

39 Search 1: Expert agreement and content based reranking in a meta search
environment using Mearf
B. Uygar Oztekin, George Karypis, Vipin Kumar

May 2002 Proceedings of the 11th international conference on World Wide Web

Publisher: ACM Press

Full text available: pdf(509.92 KB) terms

Recent increase in the number of search engines on the Web and the availability of meta 
search engines that can query multiple search engines makes it important to find effective 
methods for combining results coming from different sources. In this paper we introduce 
novel methods for reranking in a meta search environment based on expert agreement 
and contents of the snippets. We also introduce an objective way of evaluating different 
methods for ranking search results that is based upon implici ... 

Keywords: collection fusion, expert agreement, merging, meta search, reranking 

40 Summarization: Generic summarization and keyphrase extraction using mutual
reinforcement principle and sentence clustering
Hongyuan Zha

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press

Full text available: pdf(191.05 KB) Additional Information: full citation, abstract, references, citings, index terms

A novel method for simultaneous keyphrase extraction and generic text summarization is 
proposed by modeling text documents as weighted undirected and weighted bipartite 
graphs. Spectral graph clustering algorithms are used for partitioning sentences of the
documents into topical groups with sentence link priors being exploited to enhance 
clustering quality. Within each topical group, saliency scores for keyphrases and sentences 
are generated based on a mutual reinforcement principle. The ... 

Keywords: bipartite graph, graph partitioning, keyphrase extraction, mutual 
reinforcement principle, singular value decomposition, text summarization

41 DOCUMENTS: an interactive online solution to four documentation problems
T. R. Girill, Clement H. Luk

May 1983 Communications of the ACM, Volume 26 issue 5 
Publisher: ACM Press 

Full text available: pdf(1.14 MB) Additional Information: full citation, abstract, references, citings, index terms



An adequate delivery system for user documentation addresses the problems of easy 
access, versatile publication, convenient administration, and good document quality. At 
the National Magnetic Fusion Energy Computer Center the DOCUMENT program helps 
solve these problems by providing a high level of service through strategies that can 
readily be exported to other contexts. Dividing machine-readable documents into keyword 
windows permits fully online, subject-oriented ... 

Keywords: help packages, information retrieval, keywords, online catalogs, user 
assistance, user interfaces 



42 Information retrieval: Learning from relevant documents in large scale routing 
retrieval 

K. L. Kwok, L. Grunfeld 

March 1994 Proceedings of the workshop on Human Language Technology HLT '94
Publisher: Association for Computational Linguistics 

Full text available: pdf(557.01 KB) Additional Information: full citation , abstract , references

The normal practice of selecting relevant documents for training routing queries is to 
either use all relevants or the 'best n' of them after a (retrieval) ranking operation with 
respect to each query. Using all relevants can introduce noise and ambiguities in training
because documents can be long with many irrelevant portions. Using only the 'best n' risks 
leaving out documents that do not resemble a query. Based on a method of segmenting 
documents into more uniform size subdocuments, a better ... 

43 IRIS hypermedia services 
Bernard J. Haan, Paul Kahn, Victor A. Riley, James H. Coombs, Norman K. Meyrowitz 
January 1992 Communications of the ACM, volume 35 issue 1
Publisher: ACM Press 

Full text available: pdf(5.66 MB) Additional Information: full citation , references , citings , index terms , review

Keywords: IRIS hypermedia services, hypermedia, hypertext, intermedia



44 Designing theory-based systems: a case study 

John B. Smith, Marcy Lansman
June 1992 Proceedings of the SIGCHI conference on Human factors in computing systems
Publisher: ACM Press 

Full text available: pdf(1.19 MB) Additional Information: full citation , abstract , references , index terms

In this paper, we discuss principles for designing and testing computer systems intended 
to support users' thinking as they perform open-ended or ill-defined tasks. We argue that 
such systems inherently and inevitably implement a model of users' cognitive behaviors. 
Making that model explicit can provide system developers with guidance in taking design 
decisions. However, both model and system must be tested and refined. We discuss these 
principles in relation to a case study in which our g ... 

Keywords: cognitive models, cognitive modes and strategies, system design, task 
analysis, user testing 



45 Building efficient and effective metasearch engines
Weiyi Meng, Clement Yu, King-Lup Liu 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1
Publisher: ACM Press 

Full text available: pdf(416.07 KB) Additional Information: full citation , abstract , references , citings , index terms

Frequently a user's information needs are stored in the databases of multiple search 
engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search 
engines and identify useful documents from the returned results. To support unified access 
to multiple search engines, a metasearch engine can be constructed. When a metasearch 
engine receives a query from a user, it invokes the underlying search engines to retrieve 
useful information for the user. Metasearch engines have ... 

Keywords: Collection fusion, distributed collection, distributed information retrieval, 
information resource discovery, metasearch 



46 Document structure and content analysis 1: Structuring documents according to their
table of contents
Herve Dejean, Jean-Luc Meunier 

November 2005 Proceedings of the 2005 ACM symposium on Document engineering 
DocEng '05 

Publisher: ACM Press 

Full text available: pdf(544.22 KB) Additional Information: full citation , abstract , references , index terms

In this paper, we present a method for structuring a document according to the 
information present in its Table of Contents. The detection of the ToC as well as the 
determination of the parts it refers to in the document body rely on a series of generic 
properties characterizing any ToC, while its hierarchization is achieved using clustering 
techniques. We also report on the robustness and performance of the method before 
discussing it, in light of related work. 

Keywords: document structuring, table of contents recognition 




47 Experiments with list ranking for explicit multi-threaded (XMT) instruction parallelism

Dascal Vishkin, Uzi Vishkin
December 2000 Journal of Experimental Algorithmics (JEA), Volume 5




Publisher: ACM Press 

Full text available: pdf(347.52 KB) Additional Information: full citation , abstract , references , citings , index terms

Algorithms for the problem of list ranking are empirically studied with respect to the
Explicit Multi-Threaded (XMT) platform for instruction-level parallelism (ILP). The main 
goal of this study is to understand the differences between XMT and more traditional 
parallel computing implementation platforms/models as they pertain to the well studied 
list ranking problem. The main two findings are: (i) good speedups for much smaller 
inputs are possible and (ii) in part, the first finding i ... 

48 Special issue on knowledge representation 
Ronald J. Brachman, Brian C. Smith 
February 1980 ACM SIGART Bulletin, issue 70 

Publisher: ACM Press 

Full text available: pdf(13.13 MB) Additional Information: full citation , abstract

In the fall of 1978 we decided to produce a special issue of the SIGART Newsletter devoted 
to a survey of current knowledge representation research. We felt that there were two
useful functions such an issue could serve. First, we hoped to elicit a clear picture of how 
people working in this subdiscipline understand knowledge representation research, to 
illuminate the issues on which current research is focused, and to catalogue what 
approaches and techniques are currently being developed. Secon ... 

49 What makes the differences: benchmarking XML database implementations
Hongjun Lu, Jeffrey Xu Yu, Guoren Wang, Shihui Zheng, Haifeng Jiang, Ge Yu, Aoying Zhou 
February 2005 ACM Transactions on Internet Technology (TOIT), volume 5 issue 1

Publisher: ACM Press 

Full text available: pdf(589.14 KB) Additional Information: full citation , abstract , references , index terms

XML is emerging as a major standard for representing data on the World Wide Web. 
Recently, many XML storage models have been proposed to manage XML data. In order to 
assess an XML database's abilities to deal with XML queries, several benchmarks have also
been proposed, including XMark and XMach. However, no reported studies using those 
benchmarks were found that can provide users with insights on the impacts of a variety of 
storage models on XML query performance. In this article, we report our ... 

Keywords: XML query processing, XML storage model, benchmark 



50 Industry/government track paper: Deriving marketing intelligence from online
discussion

Natalie Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi
Tomokiyo

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(629.35 KB) Additional Information: full citation , abstract , references , index terms

Weblogs and message boards provide online forums for discussion that record the voice of 
the public. Woven into this mass of discussion is a wide range of opinion and commentary 
about consumer products. This presents an opportunity for companies to understand and 
respond to the consumer by analyzing this unsolicited feedback. Given the volume, format 
and content of the data, the appropriate approach to understand this data is to use large-
scale web and text data mining technologies. This paper ar ...

Keywords: computational linguistics, content systems, information retrieval, machine 
learning, text mining 
51 Topic-based browsing within a digital library using keyphrases
Steve Jones, Gordon Paynter 

August 1999 Proceedings of the fourth ACM conference on Digital libraries 

Publisher: ACM Press 

Full text available: pdf(266.18 KB) Additional Information: full citation , references , citings , index terms



Keywords: automated hypertext generation, information exploration, information 
retrieval, keyphrase extraction 



52 Special issue: AI in engineering
A. D. Sriram, R. Joobbani 

April 1985 ACM SIGART Bulletin, issue 92

Publisher: ACM Press 

Full text available: pdf(8.79 MB) Additional Information: full citation , abstract

The papers in this special issue were compiled from responses to the announcement in the 
July 1984 issue of the SIGART newsletter and notices posted over the ARPAnet. The 
interest being shown in this area is reflected in the sixty papers received from over six 
countries. About half the papers were received over the computer network. 

53 Hypertext, full text, and automatic linking
J. H. Coombs 

December 1989 Proceedings of the 13th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(1.46 MB) Additional Information: full citation , abstract , references , citings , index terms

Current computing systems typically support only mid-century information structures: 
simple hierarchies. Hypertext technologies enable users to impose many structures on 
document sets and, consequently, provide many paths to desired information, but they 
require that users work their way through some structure. Full-text search eliminates this 
requirement by ignoring structure altogether. The search strategy can also be restricted to 
work within specified contexts. The architecture provided ... 

54 Summarization and question answering: Using librarian techniques in automatic text
summarization for information retrieval

Min-Yen Kan, Judith L. Klavans

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 

Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation , abstract , references , citings , index terms

A current application of automatic text summarization is to provide an overview of relevant 
documents coming from an information retrieval (IR) system. This paper examines how 
Centrifuser, one such summarization system, was designed with respect to methods used 
in the library community. We have reviewed these librarian expert techniques to assist 
information seekers and codified them into eight distinct strategies. We detail how we 
have operationalized six of these strategies in Centrifuser by c ... 

Keywords: automatic text summarization, information retrieval user interfaces, reference 
librarian techniques 



55 Search 2: Evaluating strategies for similarity search on the web 
May 2002 Proceedings of the 11th international conference on World Wide Web

Finding pages on the Web that are similar to a query page (Related Pages) is an important
component of modern search engines. A variety of strategies have been proposed for 
answering Related Pages queries, but comparative evaluation by user studies is expensive, 
especially when large strategy spaces must be searched (e.g., when tuning parameters). 
We present a technique for automatically evaluating strategies using Web hierarchies, 
such as Open Directory, in place of user feedback. We apply this ... 

Keywords: evaluation, open directory project, related pages, search, similarity search 

This paper describes a novel tool, SmartSkim, for content-based browsing or skimming of
documents. The tool integrates concepts from passage retrieval and from interfaces, such 
as TileBars, which provide a compact overview of query term hits within a document. We 
base our tool on the concept of relevance profiling, in which a plot of retrieval status 
values at each word position of a document is generated. A major contribution of this 
paper is applying language modelling to the task of relevance ... 

Keywords: browsing and reading appliances, e-books, information retrieval, language 
modeling, user interfaces, visualization 

Structured document databases can be naturally viewed as derivation trees of a context-
free grammar. Under this view, the classical formalism of attribute grammars becomes a 
formalism for structured document query languages. From this perspective, we study the 
expressive power of BAGs: Boolean-valued attribute grammars with propositional logic 
formulas as semantic rules, and RAGs: relation-valued attribute grammars with first-order 
logic formulas as semantic rules. BAGs can express only unary qu ... 

Spoken dialogue systems allow users to interact with computer-based applications such as
databases and expert systems by using natural spoken language. The origins of spoken 
dialogue systems can be traced back to Artificial Intelligence research in the 1950s 
concerned with developing conversational interfaces. However, it is only within the last 
decade or so, with major advances in speech technology, that large-scale working systems
have been developed and, in some cases, introduced into commerc ... 

Keywords: Dialogue management, human computer interaction, language generation, 
language understanding, speech recognition, speech synthesis 

Automatic summarization of open domain spoken dialogues is a new research area. This
paper introduces the task, the challenges involved, and presents an approach to obtain 
automatic extract summaries for multi-party dialogues of four different genres, without 
any restriction on domain. We address the following issues which are intrinsic to spoken 
dialogue summarization and typically can be ignored when summarizing written text such 
as newswire data: (i) detection and removal of speech disfl ... 

We describe WEST, a WEb browser for Small Terminals, that aims to solve some of the
problems associated with accessing web pages on hand-held devices. Through a novel 
combination of text reduction and focus+context visualization, users can access web pages 
from a very limited display environment, since the system will provide an overview of the 
contents of a web page even when it is too large to be displayed in its entirety. To make 
maximum use of the limited resources available on a typica ... 

Keywords: WAP (wireless application protocol), flip zooming, focus+context visualization, 
hand-held devices, proxy systems, text reduction, web browser 